Whole exome sequencing analysis pipeline for the discovery of mutations causative of human rare diseases

Authors

  • Francisco Javier Lopez Domingo
  • Antonio Rueda Martin
  • J. P. Florido
  • Alicia Vela Boza
  • Pablo Arce Garcia
  • Luis Miguel Cruz Renedo
  • Javier Escalante
  • Ana Isabel Lopez Perez
  • Federica Trombetta
  • Guillermo Antinolo
  • Javier Santoyo
Abstract

IWBBIO 2013. Proceedings, Granada, 18-20 March 2013.

Recent advances in high-throughput sequencing technologies have made exome sequencing an outstanding tool for finding disease-associated mutations at a relatively low cost. However, transforming the vast amount of sequence data into meaningful variants that improve disease understanding is a non-trivial task. Several challenges arise with this approach; the critical checkpoints are raw read preprocessing, mapping, variant calling and subsequent variant selection. A number of computational algorithms and pipelines have been reported for variant analysis, but none of them provides a complete strategy from raw data to Mendelian analysis results. Here, we present a methodology that spans from SOLiD raw read processing to Mendelian analysis and variant selection, together with its application to a set of samples from The Medical Genome Project, which demonstrates the good performance of the methodology. The input of the pipeline is an xsq file generated by Applied Biosystems SOLiD 5500 XL sequencers, while the output is the result of variant annotation and Mendelian analysis, assuming that the samples are derived from a group or a family. A brief description of the steps is provided below:

1. Generation of FASTA and qual files from xsq files.
2. Removal of duplicated reads.
3. Read mapping with the BLAT-like Fast Accurate Search Tool v0.7.0a (BFAST) [1].
4. BAM cleaning: removal of duplicated alignments and mismatched reads.
5. BAM realignment and SNV calling using the Genome Analysis Toolkit v1.4.14 (GATK) [2].
6. Variant quality filter based on GATK Best Practices V3, plus a depth filter.
7. Variant annotation with the Annotate Variation package (ANNOVAR) [3]; prediction of variant functional impact with SIFT [4] and PolyPhen [5]; assessment of variant frequency with 1000 Genomes [6] and dbSNP [7].
8. Mendelian filter of deleterious variants.
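As an illustration of step 6, the following minimal Python sketch combines annotation-based quality thresholds with a depth threshold. It is not the authors' implementation: the `VariantCall` record, the `passes_filters` function and every default threshold are hypothetical placeholders, since the abstract does not state the values used. Only the annotation names (QD, MQ, FS) correspond to standard GATK variant annotations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantCall:
    """Hypothetical, simplified record for one called SNV."""
    chrom: str
    pos: int
    ref: str
    alt: str
    qd: float    # QualByDepth annotation
    mq: float    # RMS mapping quality annotation
    fs: float    # FisherStrand annotation
    depth: int   # read depth at the site


def passes_filters(call, min_qd=2.0, min_mq=40.0, max_fs=60.0, min_depth=10):
    """Return True if the call clears every quality threshold and the depth threshold.

    All default thresholds are illustrative assumptions, not values
    taken from the abstract.
    """
    return (
        call.qd >= min_qd
        and call.mq >= min_mq
        and call.fs <= max_fs
        and call.depth >= min_depth
    )


# Made-up example calls, for illustration only.
calls = [
    VariantCall("chr1", 1000, "A", "G", qd=12.5, mq=58.0, fs=1.2, depth=35),
    VariantCall("chr1", 2000, "C", "T", qd=1.1, mq=59.0, fs=0.5, depth=40),  # low QD
    VariantCall("chr2", 3000, "G", "A", qd=15.0, mq=60.0, fs=2.0, depth=4),  # low depth
    VariantCall("chr2", 4000, "T", "C", qd=9.0, mq=30.0, fs=3.0, depth=50),  # low MQ
]

kept = [c for c in calls if passes_filters(c)]
print([(c.chrom, c.pos) for c in kept])
```

Keeping the thresholds as keyword arguments makes it easy to tighten or relax the depth cut-off independently of the quality cut-offs.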
The Medical Genome Project (MGP) aims to characterize a large number of rare genetically-based diseases. As a proof of concept, we selected from the MGP a set of individuals affected by several hereditary rare diseases, their healthy relatives and a set of 50 healthy control individuals from the Spanish population. The full methodology was run, and the results reveal a number of deleterious haplotypes in several genes which could be directly associated with the diseases. Validation of some of the variants predicted by the pipeline demonstrates the good performance of our methodology. Critical aspects to achieve such performance are (i) BAM filtering, since BFAST allows an excessive number of mismatches for short reads; (ii) the selection of variant filters and quality thresholds as recommended by GATK Best Practices V3, in combination with a depth threshold, which allows high-quality calls; and (iii) the inclusion of control individuals in the analysis, which is essential because it allows the removal of population variants that would otherwise confound the interpretation of the final variant set.
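The role of healthy relatives and control individuals in the Mendelian filter can be sketched as follows. This is a simplified illustration under stated assumptions, not the method described by the authors: the function `mendelian_filter`, the genotype encoding (alternate allele count per sample), the two inheritance models and the rule that a missing genotype counts as homozygous reference are all hypothetical choices made for the example.

```python
def mendelian_filter(genotypes, affected, unaffected, controls, model="recessive"):
    """Keep variants that segregate with the disease and are not carried by controls.

    genotypes: {variant_id: {sample_name: alt_allele_count}} with counts 0, 1 or 2.
    A sample missing from a variant's dict is treated as 0 (an assumption).
    """
    if model == "recessive":
        def is_carrier(alt_count):
            return alt_count == 2   # homozygous for the alternate allele
    elif model == "dominant":
        def is_carrier(alt_count):
            return alt_count >= 1   # at least one alternate allele
    else:
        raise ValueError(f"unknown model: {model}")

    kept = []
    for variant, by_sample in genotypes.items():
        def fits(sample):
            return is_carrier(by_sample.get(sample, 0))

        if not all(fits(s) for s in affected):
            continue   # every affected individual must fit the model
        if any(fits(s) for s in unaffected):
            continue   # no healthy relative may fit the model
        if any(fits(s) for s in controls):
            continue   # population variants seen in controls are removed
        kept.append(variant)
    return kept


# Made-up family and control genotypes, for illustration only.
genotypes = {
    "chr1:1000:A>G": {"proband": 2, "sibling": 2, "mother": 1, "father": 1},
    "chr2:3000:G>A": {"proband": 2, "sibling": 2, "mother": 1, "father": 1, "ctrl_07": 2},
    "chr3:500:C>T": {"proband": 2, "sibling": 1, "mother": 1, "father": 0},
}
affected = ["proband", "sibling"]
unaffected = ["mother", "father"]
controls = ["ctrl_01", "ctrl_07"]

candidates = mendelian_filter(genotypes, affected, unaffected, controls)
print(candidates)
```

In this toy data set the second variant segregates correctly within the family but is homozygous in a control, so it is discarded as a likely population variant; without the controls it would have survived the filter.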

Related resources

Syndromic Intellectual Disability Caused by a Novel Truncating Variant in AHDC1: A Case Report

Mutations in the AHDC1 gene are associated with the Xia-Gibbs syndrome (XGS), a sporadic genetic disorder characterised by developmental delay, intellectual disability, hypotonia, obstructive sleep apnoea, dysmorphic facial features, and cerebral malformations with plagiocephaly. Here we report the case of a 13-year-old Colombian female patient with a history of developmental delay, speech dela...

Whole Exome Sequencing Reveals a BSCL2 Mutation Causing Progressive Encephalopathy with Lipodystrophy (PELD) in an Iranian Pediatric Patient

Background: Progressive encephalopathy with or without lipodystrophy is a rare autosomal recessive childhood-onset seipin-associated neurodegenerative syndrome, leading to developmental regression of motor and cognitive skills. In this study, we introduce a patient with developmental regression and autism. The causative mutation was found by exome sequencing. Methods: The proband showed a gener...

A software pipeline for the discovery of variations in exome sequencing projects

Motivations: The recent advances in the technologies and strategies for DNA sequencing have dramatically facilitated the identification of novel human genes associated with rare and common diseases [1]. However, novel methods are needed to identify high-quality variations among all those identified in a single experiment. The most successful approach to identify disease-causing mutations consi...

Exome Sequencing: Current and Future Perspectives

The falling cost of DNA sequencing has made the technology affordable to many research groups, enabling researchers to link genomic variants to observed phenotypes in a range of species. This review focusses on whole exome sequencing and its applications in humans and other species. The exome has traditionally been defined to consist of only the protein coding portion of the genome; a region wh...

NF1 Mutations Analysis Using Whole Exome Sequencing Technique in 11 Unrelated Iranian Families with Neurofibromatosis Type 1

Background: Neurofibromatosis is an autosomal dominant disease that affects one in 2,700 to 3,300 people. The main gene mutated in the disease, NF1, encodes a tumor suppressor protein called neurofibromin. The disease falls into several categories, the most important being neurofibromatosis type 1 and type 2. Here, we aimed to identify th...

Whole Exome Sequencing Revealed a Novel GJB1 Pathogenic Variant and a Rare BSCL2 Mutation in Two Iranian Large Pedigrees with Multiple Affected Cases of Charcot-Marie-Tooth

Charcot-Marie-Tooth disease (CMT) is the most common hereditary neuropathy of the peripheral nervous system, with a wide range of severity and age of onset. CMT patients share similar phenotypes, which often makes it impossible to identify the disease types based on clinical presentation and electrophysiological studies alone. In recent years, novel genetic diagnostic approaches such as whole exom...

Publication date: 2013